
    Scalable Loss-calibrated Bayesian Decision Theory and Preference Learning

    Bayesian decision theory provides a framework for optimal action selection under uncertainty, given a utility function over actions and world states and a distribution over world states. Its application in practice is often limited by two problems: (1) in application domains such as recommendation, the true utility function of a user is a priori unknown and must be learned from user interactions; and (2) computing expected utilities under complex state distributions and (potentially uncertain) utility functions is often computationally expensive and requires tractable approximations. In this thesis, we aim to address both of these problems. For (1), we take a Bayesian non-parametric approach to utility function modeling and learning. In our first contribution, we exploit community structure prevalent in collective user preferences using a Dirichlet Process mixture of Gaussian Processes (GPs). In our second contribution, we take the underlying GP preference model of the first contribution and show how to jointly address both (1) and (2) by sparsifying the GP model in order to preserve optimal decisions while ensuring tractable expected utility computations. In our third and final contribution, we directly address (2) in a Monte Carlo framework by deriving an optimal loss-calibrated importance sampling distribution, and we show how it can be extended to the uncertain utility representations developed in the previous contributions. Our empirical evaluations, spanning multiple preference learning problems with synthetic and real user data as well as robotics decision-making scenarios derived from actual occupancy grid maps, demonstrate the effectiveness of the theoretical foundations laid in this thesis and pave the way for future advances at the intersection of Bayesian decision theory and scalable machine learning.
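The core recipe the abstract builds on, choosing the action that maximizes expected utility under a posterior over world states, can be sketched with a plain Monte Carlo estimator. This is a minimal toy illustration, not the thesis's loss-calibrated method; the action names, utility table, and state distribution are invented for the example.

```python
import random

def expected_utility(action, utility, state_samples):
    """Monte Carlo estimate of E_s[U(a, s)] from posterior state samples."""
    return sum(utility(action, s) for s in state_samples) / len(state_samples)

def bayes_optimal_action(actions, utility, state_samples):
    """Select the action maximizing the estimated expected utility."""
    return max(actions, key=lambda a: expected_utility(a, utility, state_samples))

# Toy setup: binary world state with P(s=1) ~ 0.7, two candidate actions.
random.seed(0)
samples = [1 if random.random() < 0.7 else 0 for _ in range(10_000)]
U = lambda a, s: {("risky", 1): 10, ("risky", 0): -5,
                  ("safe", 1): 2,  ("safe", 0): 2}[(a, s)]
best = bayes_optimal_action(["risky", "safe"], U, samples)
# E[U(risky)] ~ 0.7*10 + 0.3*(-5) = 5.5 beats the constant 2 of "safe".
```

The thesis's contributions can be read as making this loop tractable when the utility itself is uncertain (a GP posterior rather than a fixed table) and when naive sampling wastes effort on states that do not affect the decision.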

    Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup

    Mixup is a highly successful technique for improving the generalization of neural networks by augmenting the training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup to specific pairs, e.g. only combining examples across classes or domains. These methods have claimed remarkable improvements on benchmarks with distribution shifts, but their mechanisms and limitations remain poorly understood. We examine an overlooked aspect of selective mixup that explains its success in a completely new light. We find that the non-random selection of pairs affects the training distribution and improves generalization by means completely unrelated to the mixing. For example, in binary classification, mixup across classes implicitly resamples the data for a uniform class distribution, a classical solution to label shift. We show empirically that this implicit resampling explains much of the improvements in prior work. Theoretically, these results rely on a regression toward the mean, an accidental property that we identify in several datasets. We thus establish a new equivalence between two successful methods: selective mixup and resampling. We identify limits of the former, confirm the effectiveness of the latter, and find better combinations of their respective benefits.
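The resampling effect described above is easy to verify numerically: if every mixup pair must contain one example from each class, the examples reaching the training loop are drawn 50/50 from the two classes regardless of the original imbalance. A minimal sketch with an invented imbalanced label set:

```python
import random
from collections import Counter

random.seed(0)
# Hypothetical imbalanced binary dataset: 90% class 0, 10% class 1.
labels = [0] * 900 + [1] * 100

# Selective mixup "across classes": each pair takes one example per class.
by_class = {0: [l for l in labels if l == 0],
            1: [l for l in labels if l == 1]}
pairs = [(random.choice(by_class[0]), random.choice(by_class[1]))
         for _ in range(1000)]

# Count class membership of the examples actually used in training pairs.
seen = Counter(l for pair in pairs for l in pair)
# Each pair contributes exactly one 0 and one 1: a uniform class distribution,
# independent of the 9:1 imbalance in the raw data.
```

This is the paper's point in miniature: the pairing rule alone already performs classical class-balancing, before any mixing of inputs takes place.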

    Learning And Optimization Of The Kernel Functions From Insufficiently Labeled Data

    Among machine learning techniques, kernel methods are increasingly popular due to their efficiency, accuracy and ability to handle high-dimensional data. The fundamental problem with these techniques is the selection of the kernel function. Learning the kernel, i.e. selecting the kernel function for a particular dataset, is therefore highly important. In this thesis, two approaches to learning the kernel function are proposed: transferred learning of the kernel and an unsupervised approach to learning the kernel. The first approach uses transferred knowledge from unlabeled data to cope with situations where training examples are scarce. Unlabeled data is used in conjunction with labeled data to construct an optimized kernel using Fisher discriminant analysis and maximum mean discrepancy. Classification accuracy, the fraction of correctly predicted test examples, is compared between the base kernels and the optimized kernel on two datasets involving satellite images and synthetic data, where the proposed approach produces better results. The second approach is an unsupervised method for learning a linear combination of kernel functions.
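Two of the ingredients named above, a linear combination of base kernels and the maximum mean discrepancy (MMD), can be sketched directly. This is a toy one-dimensional illustration with invented weights and data, not the thesis's optimization procedure:

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) base kernel on scalars."""
    return math.exp(-gamma * (x - y) ** 2)

def linear(x, y):
    """Linear base kernel on scalars."""
    return x * y

def combined_kernel(weights, kernels):
    """Non-negative linear combination k(x,y) = sum_i w_i * k_i(x,y)."""
    def k(x, y):
        return sum(w * base(x, y) for w, base in zip(weights, kernels))
    return k

def mmd2(X, Y, k):
    """Biased estimate of squared maximum mean discrepancy under kernel k."""
    xx = sum(k(a, b) for a in X for b in X) / (len(X) ** 2)
    yy = sum(k(a, b) for a in Y for b in Y) / (len(Y) ** 2)
    xy = sum(k(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return xx + yy - 2 * xy

k = combined_kernel([0.7, 0.3], [rbf, linear])   # hypothetical weights
same = mmd2([0.0, 0.1, 0.2], [0.0, 0.1, 0.2], k)  # identical samples -> 0
diff = mmd2([0.0, 0.1, 0.2], [5.0, 5.1, 5.2], k)  # shifted samples -> large
```

Kernel learning in the MMD setting then amounts to choosing the weights so that the discrepancy between the relevant distributions (e.g. labeled and unlabeled domains) behaves as desired.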

    Soccer event detection via collaborative multimodal feature analysis and candidate ranking

    This paper presents a framework for soccer event detection through collaborative analysis of the textual, visual and aural modalities. The basic notion is to decompose a match video into smaller segments until, ultimately, the desired eventful segment is identified. Simple features are considered, namely the minute-by-minute reports from sports websites (i.e. text), the semantic shot classes of far and close-up views (i.e. visual), and the low-level features of pitch and log-energy (i.e. audio). The framework demonstrates that, despite considering simple features and averting the use of labeled training examples, event detection can be achieved at very high accuracy. Experiments conducted on ~30 hours of soccer video show very promising results for the detection of goals, penalties, yellow cards and red cards.
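Of the low-level cues named in the abstract, log-energy is the simplest to make concrete: a short-time energy measure that rises when the crowd and commentary get loud. The frame length, sample rate, and amplitudes below are invented for illustration; the paper's actual feature extraction may differ.

```python
import math

def log_energy(frame):
    """Short-time log-energy of an audio frame (a list of samples),
    a simple low-level cue for excitement in sports audio."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-10)  # small floor avoids log(0) on silence

# Two synthetic 25 ms frames at a hypothetical 16 kHz sample rate:
quiet = [0.01 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(400)]
loud  = [0.80 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(400)]
# log_energy(loud) exceeds log_energy(quiet), flagging the louder segment.
```

In a candidate-ranking pipeline, such per-frame values would be aggregated over each video segment and combined with the textual and visual cues to rank eventful segments.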

    Unshuffling Data for Improved Generalization

    Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization. We describe a training procedure to capture the patterns that are stable across environments while discarding spurious ones. The method makes a step beyond correlation-based learning: the choice of the partitioning allows injecting information about the task that cannot otherwise be recovered from the joint distribution of the training data. We demonstrate multiple use cases with the task of visual question answering, which is notorious for dataset biases. We obtain significant improvements on VQA-CP, using environments built from prior knowledge, existing metadata, or unsupervised clustering. We also get improvements on GQA using annotations of "equivalent questions", and on multi-dataset training (VQA v2 / Visual Genome) by treating the datasets as distinct environments.
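The first step of the approach, partitioning training data into non-i.i.d. environments by a metadata field, can be sketched in a few lines. The example schema (dicts with a "source" field standing in for dataset provenance) is hypothetical, chosen to mirror the VQA v2 / Visual Genome use case:

```python
from collections import defaultdict

def build_environments(examples, key):
    """Partition training examples into environments by a metadata field,
    instead of shuffling them into one i.i.d. pool."""
    envs = defaultdict(list)
    for ex in examples:
        envs[ex[key]].append(ex)
    return dict(envs)

# Toy multi-dataset training set; "source" plays the role of prior knowledge.
data = [
    {"question": "what color is it", "label": "red",  "source": "vqa2"},
    {"question": "how many dogs",    "label": "3",    "source": "vqa2"},
    {"question": "what color is it", "label": "blue", "source": "vgenome"},
]
envs = build_environments(data, "source")
# Downstream, a training procedure would keep patterns whose predictive
# behavior is stable across envs["vqa2"] and envs["vgenome"].
```

The same helper covers the paper's other partitioning sources by swapping the key: existing annotations, prior knowledge about biases, or cluster assignments from unsupervised clustering.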